sed
[24] and awk
[25] code.
We quickly became unsatisfied with them for various reasons.
One of the concerns was efficiency.
Interpreted languages such as sed
and awk
are satisfactory for
rapid prototyping, but they are slow.
In our experience, MakeIndex was able to process
a book index of 3,300 entries in less
than 50 seconds of user time plus an extra 5% of system time on
a client node of SUN 3/50 (MC68020 at 12.5 MHz).
This is at least an order of magnitude faster than
using its most recent sed
/awk
predecessor, which has only
half the features of MakeIndex.
The decision to do a C implementation was made because of C's dynamic
storage allocation and data structures.
Since we wanted as general a solution as possible, it would be difficult,
if not impossible, to implement all the desired features using sed/awk
.
For instance, in awk
a dynamically linked list of index entries is
impossible, and the style handling mechanism with a comprehensive error
processing facility is difficult to realize. Originally developed
in UNIX C with portability in mind, MakeIndex also runs on VAX/VMS,
TOPS-20, MS/DOS systems with a few trivial changes.
This portability would have
been very difficult for an implementation in UNIX shell scripts.
Furthermore, MakeIndex has more features and is more extensible than tools like
IdxTEX [26], a LATEX-specific index processor
that runs only on VAX/VMS systems.
Our approach is the direct opposite of that taken by make.index
,
a host of indexing tools [23] built for troff.
As part of the UNIX troff document
preparation environment, these tools are canonical examples of
the pipelining model. They are a collection of small awk
programs working together in a very long pipeline.
The claim is that by breaking the system down into small pieces, it is
easier for the user to adapt the package to various situations
not possible to envision in advance. Their approach, therefore, seems
to follow the ``do it yourself'' metaphor whereby you get a toolkit
and assemble the final product yourself. The basic package covers
only those features used by its authors. A third party user has to
modify their awk
code if something different is desired.
The design of MakeIndex is quite different. Our intention is to build a complete system by analyzing the tasks involved in index processing. Parameters are derived to form a table-driven style handling facility. Formatter and format independence is achieved by simple style specification. All the underlying data structures and processing mechanisms are transparent. Yet it is robust enough to handle different precedence schemes, ordering rules, etc. To use the system, the user needs not deal with the processing details. If the default is inappropriate, the only adaptation required is a table specification for the style facility.
For instance, by assigning the output index header index.head
defined in make.index
to our preamble
and
other related commands to item_
i's, it is easy to produce
an alphabetized index in troff format from a raw index generated by troff.
The same technique applies to other formatting systems.
Scribe, for example, has an indexing subsystem
that supports a subset of the functionality described above.
By adapting its raw index file format as input style and
its output appearance commands as output style, its indexing capability
can be readily expanded using MakeIndex. In both cases,
there is no need to modify the original code used to generate
the raw index (Step II), nor is it necessary to modify MakeIndex itself.
Likewise, MakeIndex can be integrated with direct-manipulation
document development or publishing systems by binding their descriptive
markup tags to the attributes of MakeIndex's input format and output style.
To summarize the differences in approach between MakeIndex and make.index
,
let us examine the points of comparison: speed, size of the
programs, ease of use, portability, and extensibility.
MakeIndex is an order of magnitude faster. The awk
programs
of make.index
are less than 100 lines of code, while MakeIndex is large:
6,000 lines of C code and a binary of 70K bytes. MakeIndex is a self-contained
C program, adapted from our original UNIX implementation, to run on
a variety of machines from micros to mainframes. The make.index
system will run on any UNIX system which has awk
so it too spans
many systems, although it is UNIX-specific while MakeIndex is not.
MakeIndex is easy to use. It is monolithic and covers almost every conceivable
case likely to occur. We actually examined many of the books in the
Addison-Wesley Computer Science Series and found that more than 95% of the
cases could be handled by the program. The make.index
suite, on the
other hand, handles the standard cases and all customizations must be done
by the user by modifying awk
code. With MakeIndex, user changes will
very rarely occur. If they should occur, and we have never had them happen,
then the user must change the C source code. The question as to whether it is
easier to modify awk
code or C code is left to the reader.
Our answer is given in MakeIndex.